code and pre-trained model checkpoints upon paper acceptance.
We thank all the reviewers for their insightful and encouraging comments.
To Reviewers 1, 3, and 4
Q1: How do perturbations in embedding space compare to those in pixel/token space? Unlike pixels, tokens are discrete in nature.
Q2: What happens if adversarial perturbations are simultaneously added to both image and text domains?
Q3: Do the adversarial perturbations make the model more robust to adversarial attacks and paraphrases?